Many complex networks display a mesoscopic structure with groups of nodes sharing many links
with the other nodes in their group and comparatively few with nodes of different groups. This
feature is known as community structure and encodes precious information about the organization
and the function of the nodes. Many algorithms have been proposed, but it is not yet clear how they
should be tested. Recently we have proposed a general class of undirected and unweighted benchmark
graphs, with heterogeneous distributions of node degree and community size. Increasing
attention has recently been devoted to developing algorithms able to consider the direction and the
weight of the links, which require suitable benchmark graphs for testing. In this paper we extend
the basic ideas behind our previous benchmark to generate directed and weighted networks with
built-in community structure. We also consider the possibility that nodes belong to more than one
community, a feature occurring in real systems, like social networks. As a practical application, we show
how modularity optimization performs on our new benchmark.

I. INTRODUCTION

Complex systems are characterized by a division into
subsystems, which in turn contain other subsystems in a
hierarchical fashion. Herbert A. Simon, already in 1962,
pointed out that such hierarchical organization plays a
crucial role both in the generation and in the evolution
of complex systems [1]. Many complex systems can be
described as graphs, or networks, where the elementary
parts of a system and their mutual interactions are nodes
and links, respectively [2, 3]. In a network, the subsystems
appear as subgraphs with a high density of internal
links, which are loosely connected to each other. These
subgraphs are called communities and occur in a wide
variety of networked systems [4, 5]. Communities reveal
how a network is internally organized, and indicate the
presence of special relationships between the nodes, that
may not be easily accessible from direct empirical tests.
Communities may be groups of related individuals in social
networks [4, 6], sets of Web pages dealing with the
same topic [7], biochemical pathways in metabolic networks
[8, 9], etc.

For these reasons, detecting communities in networks
has become a fundamental problem in network science.
Many methods have been developed, using tools and
techniques from disciplines like physics, biology, applied
mathematics, computer and social sciences. However,
there is no agreement yet about a set of reliable algorithms
that one can use in applications. The main reason
is that current techniques have not been thoroughly
tested. Usually, when a new method is presented, it is applied
to a few simple benchmark graphs, artificial or from
the real world, which have a known community structure.
The most used benchmark is a class of graphs introduced
by Girvan and Newman [4]. Each graph consists of 128
nodes, which are divided into four groups of 32: the probabilities
of the existence of a link between a pair of nodes
of the same group and of different groups are $p_{in}$ and
$p_{out}$, respectively. This benchmark is a special case of
the planted $\ell$-partition model [10]. However, it has two
drawbacks: 1) all nodes have the same expected degree;
2) all communities have equal size. These features are
unrealistic, as complex networks are known to be characterized
by heterogeneous distributions of degree [2, 3, 11]
and community sizes [9, 12, 13, 14, 15]. In a recent paper
[16], we have introduced a new class of benchmark
graphs, that generalize the benchmark by Girvan and
Newman by introducing power law distributions of degree
and community size. Most community detection algorithms
perform very well on the benchmark by Girvan
and Newman, due to the simplicity of its structure. The
new benchmark, instead, poses a much harder test to
algorithms, and makes it easier to disclose their limits.
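
As a rough illustration of the benchmark by Girvan and Newman described above, the following Python sketch draws one realization of a graph with equal-sized groups and two link probabilities. The names `planted_partition` and `expected_degree`, as well as the default values of `p_in` and `p_out`, are illustrative assumptions and not taken from the original benchmark code; the loop over all node pairs is chosen for clarity rather than speed.

```python
import random

def planted_partition(n_groups=4, group_size=32, p_in=0.3, p_out=0.05, seed=0):
    """Draw one graph with equal-sized groups and two link probabilities."""
    rng = random.Random(seed)
    n = n_groups * group_size
    group = [v // group_size for v in range(n)]  # planted group of each node
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            # same group: probability p_in; different groups: probability p_out
            p = p_in if group[u] == group[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return group, edges

def expected_degree(n_groups, group_size, p_in, p_out):
    """Expected degree, identical for every node (drawback 1 in the text)."""
    return p_in * (group_size - 1) + p_out * group_size * (n_groups - 1)
```

Since `expected_degree` does not depend on the node and all groups have the same size, the sketch makes the two drawbacks listed in the text explicit.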

Most research on community detection focuses on the
simplest case of undirected and unweighted graphs, as
the problem is already very hard. However, links of networks
from the real world are often directed and carry
weights, and both features are essential to understand
their function [17, 18]. Moreover, in real graphs communities
are sometimes overlapping [9], i.e. they share
vertices. This aspect, frequent in certain types of systems,
like social networks, has received some attention
in the last years [15, 19, 20, 21]. Finding communities
in networks with directed and weighted edges and possibly
overlapping communities is highly non-trivial. Many
techniques working on undirected graphs, for instance,
cannot be extended to include link direction. This implies
the need for new approaches to the problem. In any
case, once a method is designed, it is important to test it
against reliable benchmarks. Since the new benchmark of
Ref. [16] is defined for undirected and unweighted graphs,
we extend it here to the directed and weighted cases. For
any type of benchmark, we will include the possibility to
have overlapping communities. Sawardecker et al. have
recently proposed a different benchmark with overlapping
communities where the probability that two nodes
are linked grows with the number of communities both
nodes belong to [22].

Our algorithms to create the benchmark graphs have a
computational complexity which grows linearly with the
number of links and reduce considerably the fluctuations
of specific realizations of the graphs, so that they come as
close as possible to the type of structure described by the
input parameters. We use our benchmark to test
modularity optimization [23], which is well defined
in the case of directed and weighted networks [24].

In Section II we describe the algorithms to create the
new benchmarks. Tests are presented in Section III. Conclusions
are summarized in Section IV.



II. THE BENCHMARK 

We start by presenting the algorithm to build the 
benchmark for undirected graphs with overlaps between 
communities. Then we extend it to the case of weighted 
and directed graphs. 

A. Unweighted benchmark with overlapping nodes 

The aim of this section is to describe the algorithm to
generate undirected and unweighted benchmark graphs,
where each node is allowed to have memberships in more
than one community. The algorithm consists of the following
steps:

1. We first assign the number $\nu_i$ of memberships
of node $i$, i.e. the number of communities the
node belongs to. Of course, if each node has
only one membership, we recover the benchmark
of Ref. [16]; in general we can assign the number
of memberships according to a certain distribution.
Next, we assign the degrees $\{k_i\}$ by drawing
random numbers from a power law distribution
[25] with exponent $\tau_1$. We also introduce the
topological mixing parameter $\mu_t$: $k_i^{(in)} = (1 - \mu_t) k_i$
is the internal degree of the node $i$, i.e. the number
of neighbors of node $i$ which have at least one
membership in common with $i$. In this way, the internal
degree is a fixed fraction of the total degree
for all the nodes. Of course, it is straightforward
to generalize the algorithm to implement a different
rule (one can introduce a non linear functional
dependence, individual mixing parameters, etc.).

2. The community sizes $\{s_\xi\}$ are assigned by drawing
random numbers from another power law with
exponent $\tau_2$. Naturally, the sum of the community
sizes must equal the sum of the node memberships,
i.e. $\sum_\xi s_\xi = \sum_i \nu_i$. Furthermore
$s_{max} = \max\{s_\xi\} \le N$ and $\nu_{max} = \max\{\nu_i\} \le n_c$,
where $N$ is the number of nodes and $n_c$ the number
of communities. At this point, we have to decide
which communities each node should be included
into. This is equivalent to generating a bipartite
network where the two classes are the $n_c$ communities
and the $N$ nodes; each community $\xi$ has $s_\xi$
links, whereas each node has as many links as its
memberships $\nu_i$ (Fig. 1). The network can be easily
generated with the configuration model [26]. To
build the graph, it is important to take into account
the constraint

$$k_i^{(in)} \le \sum_{\xi \ni i} (s_\xi - 1), \quad \forall i, \qquad (1)$$

where the sum is relative to the communities including
node $i$. This condition means that each
node cannot have an internal degree larger than
the highest possible number of nodes it can be connected
to within the communities it stays in. We
perform a rewiring process for the bipartite network
until the constraint is satisfied. For some choices of
the input parameters, it could happen that, after
some iterations, the constraint is still unsatisfied.
In this case one can change the sizes of the communities,
by merging some of them, for instance.
It turns out that this is not necessary in most situations
and that, when it is, the perturbations introduced
in the community size distributions are
not too large. In general, it is convenient to start
with a distribution of community sizes such that

So far we assigned an internal degree to each node
but it has not been specified how many links should
be distributed among the communities of the node.
Again, one can follow several recipes; we chose the
simple equipartition $k_i(\xi) = k_i^{(in)} / \nu_i$, where $k_i(\xi)$
is the number of links which $i$ shares in community
$\xi$, provided that $i$ holds membership in $\xi$. Some
adjustments may be necessary to assure

$$k_i(\xi) \le s_\xi - 1, \quad \forall i, \qquad (2)$$

which is the strong version of Eq. (1).

FIG. 1: Schematic diagram of the bipartite graph used to
assign nodes to their communities. Each node has as many
stubs as the number of communities it belongs to, whereas
the number of stubs of each community matches the size of
the community. The memberships are assigned by joining the
stubs on the left with those on the right.
3. Before generating the whole network, we start by generating
$n_c$ subgraphs, one for each community. In
fact, our definition of community $\xi$ is nothing but a
random subgraph of $s_\xi$ nodes with degree sequence
$\{k_i(\xi)\}$, which can be built via the configuration
model, with a rewiring procedure to avoid multiple
links. Note that Eq. (2) is necessary to generate the
configuration model, but in general not sufficient.
For one thing, we need $\sum_i k_i(\xi)$ to be even. This
might cause a change in the degree sequence, which
is generally not appreciable. Once each subgraph
is built, we obtain a graph divided in components.
Note that because of the overlapping nodes, some
components may be connected to each other, and
in principle the whole graph might be connected.
Furthermore, if two nodes belong simultaneously
to the same two (or more) communities, the procedure
may set more than one link between the nodes.
A rewiring strategy similar to that described below
suffices to avoid this problem.

4. The last step of the algorithm consists in adding
the links external to the communities. To do this,
let us consider the degree sequence $\{k_i^{(ext)}\}$, where
simply $k_i^{(ext)} = k_i - k_i^{(in)} = \mu_t k_i$. We want to insert
randomly these links in our already built network
without changing the internal degree sequences. In
order to do so, we build a new network $\mathcal{G}^{(ext)}$ of $N$
nodes with degree sequence $\{k_i^{(ext)}\}$, and we perform
a rewiring process each time we encounter
a link between two nodes which have at least one
membership in common (Fig. 2), since we are supposed
to join only nodes of different communities
at this stage. Let us assume that $A$ and $B$ are in
the same community and that they are linked in
$\mathcal{G}^{(ext)}$: we pick a node $C$ which does not share any
membership with $A$, and we look for a neighbor of
$C$ (call it $D$) which is not a neighbor of $B$. Next, we
replace the links $A - B$ and $C - D$ with the new
links $A - C$ and $B - D$. This rewiring procedure
can decrease the number of internal links of $\mathcal{G}^{(ext)}$
or leave it unchanged (this happens only when
$B$ and $D$ have one membership in common) but
it cannot increase it. This means that after a few
sweeps over all the nodes we reach a steady state
where the number of internal links is very close to
zero (if no node has $k_i \sim N$, a single sweep is
sufficient to remove the internal links of $\mathcal{G}^{(ext)}$).
Fig. 3 shows how the number of internal links decreases
during the rewiring procedure. Finally, we
have to superimpose $\mathcal{G}^{(ext)}$ on the network built in the
previous steps.

FIG. 2: Scheme of the rewiring procedure necessary to build
the graph $\mathcal{G}^{(ext)}$, which includes only links between nodes of
different communities. (Top) If two nodes ($A$ and $B$) with
a common membership are neighbors, their link is rewired
along with another link joining two other nodes $C$ and $D$,
where $C$ does not have memberships in common with $A$, and
$D$ is a neighbor of $C$ not connected to $B$. In the final configuration
(bottom), the degrees of all nodes are preserved,
and the number of links between nodes with common memberships
has decreased by one (since $A$ and $B$ are no longer
connected), or it has stayed the same (if $B$ and $D$, which are
now neighbors, have common memberships).
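
The rewiring of step 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is stored as a dictionary `adj` of neighbor sets, `membership` maps each node to the set of its communities, and the names `shares` and `rewire_external` are hypothetical. The sketch additionally requires that $C$ is not already a neighbor of $A$ and that $D \ne B$, so that no multiple links or self-loops are created. The candidate search scans all nodes, so it does not reproduce the linear running time discussed in the text.

```python
import random

def shares(membership, a, b):
    """True if nodes a and b have at least one membership in common."""
    return bool(membership[a] & membership[b])

def rewire_external(adj, membership, sweeps=10, seed=0):
    """Remove links between nodes with a common membership, keeping all degrees."""
    rng = random.Random(seed)
    nodes = list(adj)
    for _ in range(sweeps):
        changed = False
        for a in nodes:
            for b in list(adj[a]):
                if b not in adj[a] or not shares(membership, a, b):
                    continue  # link A-B is already external (or was removed)
                # C: no common membership with A and not yet linked to A
                candidates = [c for c in nodes
                              if c != a and c not in adj[a]
                              and not shares(membership, a, c)]
                rng.shuffle(candidates)
                for c in candidates:
                    # D: neighbor of C that is neither B nor a neighbor of B
                    ds = [d for d in adj[c] if d != b and d not in adj[b]]
                    if ds:
                        d = rng.choice(ds)
                        adj[a].discard(b); adj[b].discard(a)   # remove A-B
                        adj[c].discard(d); adj[d].discard(c)   # remove C-D
                        adj[a].add(c); adj[c].add(a)           # add A-C
                        adj[b].add(d); adj[d].add(b)           # add B-D
                        changed = True
                        break
        if not changed:
            break
    return adj
```

Each swap removes the internal link $A - B$, adds the external link $A - C$, and can create at most one new internal link ($B - D$), so the number of internal links never grows, in line with the argument above.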

In our previous work about benchmarking [16], we discussed
the dispersion of the internal degree around the
fixed value $k_i^{(in)}$. In this case, if the number of internal
links of $\mathcal{G}^{(ext)}$ goes to zero, the only reason not to have a
perfectly sharp function for the distribution of the mixing
parameters of the nodes in specific realizations of the
new benchmark is a round-off problem, i.e. the problem
of rounding integer numbers.
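
The round-off effect can be made concrete with a short Python sketch. It assumes that the internal degree is obtained by rounding $(1 - \mu_t) k_i$ to the nearest integer; the text does not specify the rounding rule, and the name `realized_mixing` is hypothetical.

```python
def realized_mixing(degrees, mu_t):
    """Per-node mixing parameter after rounding the internal degree to an integer."""
    realized = []
    for k in degrees:
        # assumed rule: nearest integer to (1 - mu_t) * k
        k_in = int((1 - mu_t) * k + 0.5)
        realized.append(1 - k_in / k)
    return realized
```

For instance, with $\mu_t = 0.25$ a node of degree 20 gets exactly 15 internal links, whereas a node of degree 10 gets 8 internal links and a realized mixing parameter of about 0.2. The spread is largest for nodes of small degree.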
FIG. 3: Number of internal links of $\mathcal{G}^{(ext)}$ as a function of the
rewiring steps. The network has 1000 nodes, and an average
degree $\langle k \rangle = 50$. Since the mixing parameter is $\mu_t = 0.8$
and there are 10 equal-sized communities, at the beginning
each node has an expected internal degree in $\mathcal{G}^{(ext)}$ of
$0.8 \times 50 \times 1/10 = 4$, so the total internal degree is around
4000. After each rewiring step, the internal degree either
decreases by 2, or it does not change. In this case, fewer than
2100 rewiring steps were needed.

Other benchmarks, like that by Girvan and Newman,
are based on a similar definition of communities, expressed
in terms of different probabilities for internal and
external links. One may wonder what is the connection
between our benchmark and the others. It is not difficult
to compute an approximation of how the probability of
having a link between two nodes in the same community
depends on the mixing parameter $\mu_t$.

In the configuration model, the probability to have a
connection between nodes $i$ and $j$ with $k_i$ and $k_j$ links
respectively is approximately $p_{ij} = k_i k_j / (2m)$, provided that
$k_i \ll 2m$ and $k_j \ll 2m$. If the approximation holds,
our prescription to assign $k_i(\xi)$ allows us to compute the
probability that $i$ and $j$ get a link in the community $\xi$:

$$p_{ij}(\xi) \simeq \frac{k_i(\xi) k_j(\xi)}{2 m_\xi} = \frac{(1 - \mu_t)^2 k_i k_j}{2 m_\xi \nu_i \nu_j}, \qquad (3)$$

where $2m_\xi = \sum_i k_i(\xi)$ is twice the number of internal links
in the community (we recall that $\nu_i$ is the number of
memberships of node $i$). If $i$ and $j$ share a number $\nu_{ij}$
of memberships and all the respective $p_{ij}(\xi)$ are small,
the probability that they get a link somewhere can be
approximated with the sum over all the common communities.
The final result is

$$p_{ij} \simeq \frac{(1 - \mu_t)^2 k_i k_j}{\nu_i \nu_j} \, \nu_{ij} \left\langle \frac{1}{2 m_\xi} \right\rangle, \qquad (4)$$

where $\langle 1/(2m_\xi) \rangle = \sum_\xi [1/(2m_\xi)] / \nu_{ij}$, and $\xi$ runs only over
the common memberships of the nodes.

On the other hand, if $i$ and $j$ do not share any membership,
the probability to have a link between them is:

$$p_{ij} \simeq \frac{k_i^{(ext)} k_j^{(ext)}}{2 m^{(ext)}} = \frac{\mu_t^2 k_i k_j}{2 m^{(ext)}}, \qquad (5)$$

where $2m^{(ext)} = \sum_i k_i^{(ext)} = \mu_t \sum_i k_i$ is twice the number of
external links in the network. The equation holds only if
the rewiring process does not affect too much the probabilities,
i.e. if the communities are small compared to
the size of the network.
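
To connect these estimates with the benchmark by Girvan and Newman, the following Python sketch evaluates them numerically. It assumes the approximations stated in the text: $p_{ij}(\xi) \simeq k_i(\xi) k_j(\xi) / (2m_\xi)$ with $k_i(\xi) = (1 - \mu_t) k_i / \nu_i$, summed over the shared communities, and $p_{ij} \simeq \mu_t k_i k_j / \sum_l k_l$ for nodes without common memberships. The function names and the choice of $\mu_t = 0.25$ with degree 16 are illustrative assumptions.

```python
def p_internal(k_i, k_j, nu_i, nu_j, mu_t, two_m_shared):
    """Approximate link probability for two nodes sharing the communities
    whose total internal degrees 2*m_xi are listed in two_m_shared."""
    prefactor = (1 - mu_t) ** 2 * k_i * k_j / (nu_i * nu_j)
    return prefactor * sum(1.0 / two_m for two_m in two_m_shared)

def p_external(k_i, k_j, mu_t, total_degree):
    """Approximate link probability for two nodes with no common membership,
    using 2*m_ext = mu_t * total_degree."""
    return mu_t * k_i * k_j / total_degree

# Girvan-Newman-like setting: 128 nodes of degree 16, four groups of 32
mu_t, k, size, n = 0.25, 16, 32, 128
two_m_xi = size * (1 - mu_t) * k      # total internal degree of one group
p_in = p_internal(k, k, 1, 1, mu_t, [two_m_xi])
p_out = p_external(k, k, mu_t, n * k)
```

In this setting the sketch gives $p_{in} \simeq (1 - \mu_t) k / s = 0.375$ and $p_{out} \simeq \mu_t k / N \approx 0.031$. Since each group contains a quarter of the nodes, the condition that communities be small compared to the network is only roughly met here, so the external estimate should be read as an approximation.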

These results are based on some assumptions which are
likely to be only approximately valid.
In any case, carrying out the exact calculation is far from
trivial and beyond the scope of this paper.

We conclude this section with a remark about the complexity
of the algorithm. The configuration model takes
a time growing linearly with the number of links $m$ of the
network. If the rewiring procedure takes only a few iterations,
as happens in most instances, the complexity
of the algorithm is $O(m)$ (Fig. 4).

FIG. 4: Computational time to build the unweighted benchmark
as a function of the average degree. We show the results
for networks of 1000 and 5000 nodes; $\mu_t$ was set equal to 0.1
and 0.5 (the latter requires more time for the rewiring process).
Note that between the two upper lines and the lower
ones there is a factor of about 5, as one would expect if complexity
is linear in the number of links $m$.



B. Weighted networks 

In order to build a weighted network, we first generate
an unweighted network with a given topological mixing
parameter $\mu_t$ and then we assign a positive real number
to each link.

To do this we need to specify two other parameters, $\beta$
and $\mu_w$. The parameter $\beta$ is used to assign a strength $s_i$
to each node, $s_i = k_i^\beta$; such power law relation between
the strength and the degree of a node is frequently observed
in real weighted networks [18]. The parameter $\mu_w$
is used to assign the internal strength $s_i^{(in)} = (1 - \mu_w) s_i$,
which is defined as the sum of the weights of the links
between node $i$ and all its neighbors having at least one
membership in common with $i$. The problem is equivalent
to finding an assignment of $m$ positive numbers
$\{w_{ij}\}$ such as to minimize the following function:

$$\mathrm{Var}(\{w_{ij}\}) = \sum_i \left[ (s_i - \rho_i)^2 + (s_i^{(in)} - \rho_i^{(in)})^2 + (s_i^{(ext)} - \rho_i^{(ext)})^2 \right]. \qquad (6)$$

Here $s_i$ and $s_i^{(in, ext)}$ indicate the strengths which we
would like to assign, i.e. $s_i = k_i^\beta$, $s_i^{(in)} = (1 - \mu_w) s_i$,
$s_i^{(ext)} = \mu_w s_i$; $\{\rho_i\}$ are the total, internal and external
strengths of node $i$ defined through its link weights, i.e.
$\rho_i = \sum_j w_{ij}$, $\rho_i^{(in)} = \sum_j w_{ij} \kappa(i, j)$, $\rho_i^{(ext)} = \sum_j w_{ij} (1 - \kappa(i, j))$,
where the function $\kappa(i, j) = 1$ if nodes $i$ and $j$
share at least one membership, and $\kappa(i, j) = 0$ otherwise.

We have to arrange things so that $s_i$ and $s_i^{(in, ext)}$ are
consistent with the $\{\rho_i\}$. For that we need a fast algorithm
to minimize $\mathrm{Var}(\{w_{ij}\})$. We found that the greedy
algorithm described below can do this job well enough for
the cases of our interest.

1. At the beginning $w_{ij} = 0$, $\forall i, j$, so all the $\{\rho_i\}$ are
zero.

2. We take node $i$ and increase the weight of each of its
links by an amount $u_i = (s_i - \rho_i) / k_i$, where $\rho_i$ indicates
the sum of the links' weights resulting from the
previous step, i.e. before we increment them. In
this way, since initially $\{\rho_i\} = 0$, the weights of the
links of $i$ after the first step take the (equal for all)
value $s_i / k_i$, and $\rho_i = s_i$ by construction, a condition
that is maintained along the whole procedure. We
update $\{\rho_i\}$ for the node $i$ and its neighbors.

3. Still for node $i$ we increase all the weights $w_{ij}$ by an
amount $(s_i^{(in)} - \rho_i^{(in)}) / k_i^{(in)}$ if $\kappa(i, j) = 1$ and by an amount
$(s_i^{(ext)} - \rho_i^{(ext)}) / k_i^{(ext)}$ if $\kappa(i, j) = 0$. Again we update $\{\rho_i\}$
for the node $i$ and its neighbors. These two steps
set the contribution of node $i$ to $\mathrm{Var}(\{w_{ij}\})$
to zero.

4. We repeat steps (2) and (3) for all the nodes. Two
remarks are in order. First, we want each weight
$w_{ij} > 0$; so we update the weights only if this condition
is fulfilled. Second, the contribution of the neighbors
of node $i$ to $\mathrm{Var}(\{w_{ij}\})$ will change and, of
course, it can increase or decrease. For this reason,
we need to iterate the procedure several times until
a steady state is reached, or until we reach a certain
value. With our procedure the value of $\mathrm{Var}(\{w_{ij}\})$
decreases at least exponentially with the number
of iterations, consisting in sweeps over all network
links (Fig. 5).

FIG. 5: Value of $\mathrm{Var}(\{w_{ij}\})$ (Eq. (6)) after each update. Each
point corresponds to one sweep over all the nodes.

Note that the average weights of the internal and external
links of node $i$ can be related to the mixing parameters in a
simple way (Fig. 6):

$$\langle w_i^{(in)} \rangle = \frac{s_i^{(in)}}{k_i^{(in)}} = \frac{1 - \mu_w}{1 - \mu_t} k_i^{\beta - 1}, \qquad \langle w_i^{(ext)} \rangle = \frac{s_i^{(ext)}}{k_i^{(ext)}} = \frac{\mu_w}{\mu_t} k_i^{\beta - 1}. \qquad (7)$$

Since $\mathrm{Var}(\{w_{ij}\})$ decreases exponentially, the number
of iterations needed to reach convergence has a slow
dependence on the size of the network so it does not
contribute much to the total complexity, which remains
$O(m)$ (Fig. 7).
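
The greedy weight assignment can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the released benchmark code: the increments are taken as $(s_i - \rho_i)/k_i$ for all links of $i$, and then $(s_i^{(in)} - \rho_i^{(in)})/k_i^{(in)}$ and $(s_i^{(ext)} - \rho_i^{(ext)})/k_i^{(ext)}$ for internal and external links respectively; the positivity condition is interpreted as skipping an increment whenever it would make one of the touched weights non-positive; the name `greedy_weights` and the data layout (edge list, `membership` dictionary of sets) are hypothetical.

```python
def greedy_weights(edges, membership, beta, mu_w, sweeps=20):
    """Assign link weights so that node strengths approach k**beta, with a
    fraction (1 - mu_w) of the strength on links to co-members."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    w = {frozenset(e): 0.0 for e in edges}   # step 1: all weights start at zero

    def kappa(i, j):
        return bool(membership[i] & membership[j])

    def rho(i, targets):
        return sum(w[frozenset((i, j))] for j in targets)

    def shift(i, targets, amount):
        # assumed reading of the positivity rule: skip the whole increment
        # if any of the touched weights would become non-positive
        if all(w[frozenset((i, j))] + amount > 0 for j in targets):
            for j in targets:
                w[frozenset((i, j))] += amount

    for _ in range(sweeps):
        for i in nbrs:
            s = len(nbrs[i]) ** beta
            inside = [j for j in nbrs[i] if kappa(i, j)]
            outside = [j for j in nbrs[i] if not kappa(i, j)]
            # step 2: match the total strength of node i
            shift(i, nbrs[i], (s - rho(i, nbrs[i])) / len(nbrs[i]))
            # step 3: match internal and external strengths separately
            if inside:
                shift(i, inside, ((1 - mu_w) * s - rho(i, inside)) / len(inside))
            if outside:
                shift(i, outside, (mu_w * s - rho(i, outside)) / len(outside))
    return w
```

On a toy graph with two communities of two nodes each, one internal and one external link per node, $\beta = 1$ and $\mu_w = 0.25$, the sketch assigns weight 1.5 to the internal links and 0.5 to the external ones after a single sweep, so that every node has strength 2 with three quarters of it inside its community.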



FIG. 6: The average weight of an internal link for a node
depends on its degree according to Eq. (7). The correlation plot
in the figure, relative to a network of 5000 nodes, confirms the
result.

FIG. 7: Computational time to build the weighted benchmark
as a function of the average degree. We show the results
for networks of 1000 and 5000 nodes; $\mu_t$ was set equal to 0.2
and $\mu_w$ to 0.4.



C. Directed networks

It is quite straightforward to generalize the previous algorithms
to generate directed networks. Now, we have an
indegree sequence $\{y_i\}$ and an outdegree sequence $\{z_i\}$
but we can still go through all the steps of the construction
of the benchmark for undirected networks with just
some slight modifications. In the following, we list what
to change in each point of the corresponding list in Section II A.

1. We decided to sample the indegree sequence from
a power law and the outdegree sequence from a $\delta$-distribution
(with the obvious constraint $\sum_i y_i = \sum_i z_i$).
We need to define the internal in- and outdegrees
$y_i(\xi)$ and $z_i(\xi)$ with respect to every community
$\xi$, which can be done by introducing two
mixing parameters. For simplicity one can set them
equal.

2. It is necessary that Eq. (2) holds for both $\{y_i(\xi)\}$ and
$\{z_i(\xi)\}$.

3. We need to use the configuration model for directed
networks, and the condition that $\sum_i k_i(\xi)$ should
be even is replaced by $\sum_i y_i(\xi) = \sum_i z_i(\xi)$; because
of this condition it might be necessary to
change $y_i(\xi)$ and/or $z_i(\xi)$. We decided to modify
only one of the two sequences, whenever necessary.

4. The rewiring procedure can be done by preserving
both distributions of indegree and outdegree, for
instance, by adopting the following scheme: before
rewiring, $A$ points to $B$ and $D$ to $C$; after rewiring,
$A$ points to $C$ and $D$ to $B$.
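
The directed rewiring scheme of point 4 can be written as a small Python function. The sketch assumes that the graph is stored as a dictionary `succ` mapping each node to the set of nodes it points to; the name `directed_swap` and the extra checks against self-loops and multiple links are illustrative additions, not taken from the text.

```python
def directed_swap(succ, a, b, c, d):
    """Replace A->B and D->C with A->C and D->B, if this is possible without
    creating self-loops or multiple links. Returns True if the swap was done."""
    if b not in succ[a] or c not in succ[d]:
        return False          # the two links to be rewired must exist
    if a == c or d == b or c in succ[a] or b in succ[d]:
        return False          # would create a self-loop or a duplicate link
    succ[a].remove(b)
    succ[a].add(c)
    succ[d].remove(c)
    succ[d].add(b)
    return True
```

After the swap, $A$ and $D$ still have the same number of outgoing links, while $B$ and $C$ still have the same number of incoming links, so both degree sequences are preserved.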



In order to generate directed and weighted networks,
we use the following relation between the strength $s_i$ of
a node and its in- and outdegree: $s_i = (y_i + z_i)^\beta$. Given
a node $i$, one considers all its neighbors, regardless of the
link directions (note that $i$ may have the same neighbor
counted twice if the link runs in both directions). Otherwise,
the procedure to insert weights is the same as in the undirected case.

In directed networks, the directedness of the links may 
reflect some interesting structural information that is not 
present in the corresponding undirected version of the 
graph. For instance there could be flows, represented 
by many links with the same direction running from one 
subgraph to another: such subgraphs might correspond 
to important classifications of the nodes. Our directed 
benchmark is based on the balance between the numbers 
of internal and external links, and it does not seem suit- 
able to generate graphs with flows. However, this is not 
true: flows can be generated by introducing proper con- 
straints on the number of incoming and outgoing links of 
the communities. 

Suppose we want to generate a network with two communities
only, where the nodes of community 1 point
to nodes of community 2 but not vice versa and there
are a few random connections among nodes in the same
community. We could use our algorithm in this way:
first we build separately the two subgraphs; then we set
$y_i^{(ext)} = 0$ for nodes in community 1 and $z_i^{(ext)} = 0$
for nodes in community 2 and build $\mathcal{G}^{(ext)}$. If there are
more communities, one first builds as many subgraphs as
necessary and then links them according to the desired
flow patterns.

Methods based on mixture models [27, 28] may detect
this kind of structures. Methods based on a balance
between internal and external links, like (directed) modularity
optimization, may have problems. For example
(Fig. 8), consider a network with three communities $A$,
$B$, $C$, with 10 nodes in each community, each node with
3 in-links and 3 out-links on average; nodes in $A$ point
to 2 nodes in $B$, nodes in $B$ point to 2 nodes in $C$, and
nodes in $C$ point to 2 nodes in $A$; each node points to
1 node in its own community. The modularity of this
partition is $Q = 0$, therefore the optimization would give
a different partition, as the maximum modularity for a
graph is usually positive.
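
The value $Q = 0$ for the cyclic example can be checked with a short Python sketch. It assumes the usual expression of directed modularity, $Q = \sum_c [e_{cc}/m - d_c^{out} d_c^{in}/m^2]$, where $e_{cc}$ is the number of links inside group $c$, $d_c^{out}$ and $d_c^{in}$ are the total out- and indegree of the group and $m$ is the number of links; the text does not write this formula out. The specific wiring built by `cyclic_flow` is one arbitrary realization of the example, and both function names are hypothetical.

```python
from fractions import Fraction

def directed_modularity(arcs, community):
    """Q = sum over groups of e_cc / m - (out_c * in_c) / m**2, computed exactly."""
    m = len(arcs)
    inside, out_deg, in_deg = {}, {}, {}
    for u, v in arcs:
        cu, cv = community[u], community[v]
        out_deg[cu] = out_deg.get(cu, 0) + 1
        in_deg[cv] = in_deg.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(Fraction(inside.get(c, 0), m)
               - Fraction(out_deg.get(c, 0) * in_deg.get(c, 0), m * m)
               for c in set(community.values()))

def cyclic_flow(size=10, groups=3):
    """Each node points to 1 node of its own group and 2 nodes of the next one."""
    community = {v: v // size for v in range(size * groups)}
    arcs = []
    for v in range(size * groups):
        c, r = divmod(v, size)
        nxt = (c + 1) % groups
        arcs.append((v, c * size + (r + 1) % size))      # link inside the group
        arcs.append((v, nxt * size + r))                 # two links to the next group
        arcs.append((v, nxt * size + (r + 1) % size))
    return arcs, community
```

With 90 links in total, each group has 10 internal links and total in- and outdegree 30, so each group contributes $10/90 - 900/8100 = 0$ to the modularity.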
FIG. 8: Example of directed graph with a flow running in a
cycle between three groups of nodes. The directedness of the
links makes it possible to distinguish the three groups, and there are
methods able to detect them. Standard community detection
methods, instead, are likely to fail. For instance, the value of
the directed modularity for the partition in the three groups
is zero, whereas the maximum modularity for the graph is
positive and corresponds to a different partition.



III. TESTS

Here we present some tests of community detection
methods on our benchmark graphs. We focused on two
techniques: modularity optimization, because it is one of
very few methods that can be extended to the cases of
directed and weighted graphs [24]; the Clique Percolation
Method (CPM) by Palla et al. [9], a popular method to
find community structure with overlapping communities.
The optimization of modularity was carried out by using
simulated annealing [8].

FIG. 9: Test of modularity optimization on our benchmark
for directed networks. The results get worse by increasing the
number of nodes and/or decreasing the average degree, as we
had found for the undirected case in Ref. [16]. Each point
corresponds to an average over 100 graph realizations.

FIG. 10: Test of modularity optimization on our benchmark
for weighted undirected networks without overlaps between
communities. The topological mixing parameter $\mu_t$ equals
the strength mixing parameter $\mu_w$. Each point corresponds
to an average over 100 graph realizations.



To measure the similarity between the built-in modular 
structure of the benchmark and the one delivered by the 
algorithm we adopt the normalized mutual information, 
a measure borrowed from information theory [29]. We 
stress that other choices for the similarity measure are 
possible (for a survey, see [31]) and that we use the nor- 
malized mutual information for two main reasons: 1) it is 
regularly used in papers about community detection, so 
one has a clear idea of the performance of the algorithms 
by looking at the results, compared to similar plots; 2) 
it has been recently extended to the case of overlapping 
communities [15], whereas most other measures have no 
such extension. 
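
For partitions without overlaps, the normalized mutual information can be computed with a few lines of Python. The sketch assumes the normalization $2 I(X, Y) / [H(X) + H(Y)]$, which is the form commonly used in papers on community detection; the text does not spell out the formula, and the extension to overlapping communities of Ref. [15] is not covered here. The function name is hypothetical.

```python
import math
from collections import Counter

def normalized_mutual_information(x, y):
    """2 * I(X, Y) / (H(X) + H(Y)) for two label lists of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    h_x = -sum(c / n * math.log(c / n) for c in px.values())
    h_y = -sum(c / n * math.log(c / n) for c in py.values())
    # p_xy / (p_x * p_y) written with counts: c * n / (count_x * count_y)
    mutual = sum(c / n * math.log(c * n / (px[a] * py[b]))
                 for (a, b), c in pxy.items())
    if h_x + h_y == 0:
        return 1.0   # both partitions consist of a single group
    return 2 * mutual / (h_x + h_y)
```

The measure equals 1 when the planted and the recovered partitions coincide (up to a relabeling of the groups) and 0 when they are statistically independent.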

Fig. 9 shows the results for the directed (unweighted)
benchmark graphs, without overlapping communities.
The plot shows a pattern very similar to that observed
in the undirected case [16].

For the weighted benchmark (still without overlapping
communities) we can tune two parameters, $\mu_t$ and $\mu_w$.
Fig. 10 refers to networks where we set $\mu_t = \mu_w$, while in
Fig. 11 we set $\mu_t = 0.5$. Since, for $\mu_w < 0.5$, $\mu_t$ is smaller
for the networks of Fig. 10 than for those in Fig. 11, we
would expect to see better performances of modularity
optimization in Fig. 10 in the range $0 < \mu_w < 0.5$. Instead,
we get the opposite result. The reason is that the
links between communities carry on average more weight
when $\mu_t = \mu_w$ (Fig. 10) than when $\mu_t = 0.5 > \mu_w$ (Fig. 11),
since the average weight of an external link grows with the
ratio $\mu_w / \mu_t$, and this enhances
the chance that mergers between small communities occur,
leading to higher values of modularity. Because
of such mergers, the partition found by the method can
be quite different from the planted partition of the benchmark.



In Figs. 12 and 13 we show the results of tests performed
with the CPM on our benchmarks with overlapping
communities. In this case, the mixing parameter $\mu_t$
is fixed and one varies the fraction of overlapping nodes
between communities. We have run the CPM for different
types of $k$-cliques ($k$ indicates the number of nodes of
the clique), with $k = 3, 4, 5, 6$. In general we notice that
triangles ($k = 3$) yield the worst performance, whereas
4- and 5-cliques give better results. In the two top diagrams
community sizes range between $s_{min} = 10$ and
$s_{max} = 50$, whereas in the bottom diagrams the range
goes from $s_{min} = 20$ to $s_{max} = 100$. By comparing the
diagrams in the top with those in the bottom we see that
the algorithm performs better when communities are (on
average) smaller. The networks used to produce Fig. 12
consist of 1000 nodes, whereas those of Fig. 13 consist of
5000 nodes. From the comparison of Fig. 12 with Fig. 13
we see that the algorithm performs better on networks of
larger size.

FIG. 11: Test of modularity optimization on our benchmark
for weighted undirected networks. The topological mixing
parameter $\mu_t = 0.5$. All other parameters are the same as in
Fig. 10. Each point corresponds to an average over 100 graph
realizations.

FIG. 12: Test of the Clique Percolation Method (CPM) by
Palla et al. [9] on our benchmark for undirected and unweighted
networks with overlapping communities. The plot
shows the variation of the normalized mutual information between
the planted and the recovered partition, in its generalized
form for overlapping communities [15], with the fraction
of overlapping nodes. The networks have 1000 nodes, the
other parameters are $\tau_1 = 2$, $\tau_2 = 1$, $\langle k \rangle = 20$ and $k_{max} = 50$.

FIG. 13: Test of the Clique Percolation Method (CPM) by
Palla et al. [9] on our benchmark for undirected and unweighted
networks with overlapping communities. The networks
have 5000 nodes, the other parameters are the same
used for the graphs of Fig. 12.



IV. SUMMARY

In this paper we have introduced new benchmark
graphs to test community detection methods on directed
and weighted networks. The new graphs are suitable extensions
of the benchmark we have recently introduced
in Ref. [16], in that they account for the fat-tailed distributions
of node degree and community size that are observed
in real networks. Furthermore we have equipped
all our new benchmark graphs with the option of having
overlapping communities, an important feature of
community structure in real networks. With this work
we have provided researchers working on the problem of
detecting communities in graphs with a complete set of
tools to make stringent objective tests of their algorithms,
something which is sorely needed in this field. We have
developed and carefully tested a software package for the
generation of each class of benchmark graphs, all of which
can be freely downloaded [32].